Conversation
|
Reviewing this against the updated #4247 plan. Right shape for PR 1 — owner-only canonical writes + 1.
|
…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
|
Both asks addressed in the fixup commit ( 1.
verdict_source: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null
2. Last-write-wins contract test New
A future switch to first-write-wins or source-priority filtering breaks these tests. All 3302 unit tests pass. Session: https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ Generated by Claude Code |
|
Code review (expert pass): solid root, ready to merge after stripping committed Block:
Nits (non-blocking):
After dist strip, mark ready and the chain unblocks (#4263 is clean too — see #4264 for the real chain blocker). |
|
Blocker addressed — pushed
Nits noted (not fixed):
Ready to mark out of draft when you are. Triaged by Claude Code. Session: https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx Generated by Claude Code |
…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
c71809a to
42e7f37
Compare
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.
…o keys (#4364) * fix(compliance): rewrite deriveStoryboardStatuses for SDK 6.x scenario keys The compliance heartbeat has been writing zero rows to agent_storyboard_status since the SDK switched comply() to storyboard- driven testing. The SDK emits one TestResult per phase of each storyboard, keyed `<storyboard_id>/<phase_id>` in result.tracks[].scenarios[].scenario (see @adcp/sdk compliance/storyboard-tracks.ts). The old implementation walked the YAML's per-step `comply_scenario` field (bare names like `signals_flow`, `capability_discovery`) and looked them up in the SDK's scenario map. Every lookup missed → testedCount === 0 → every storyboard skipped at the `continue` guard. Effect across the registry: agent_storyboard_status total rows: 6 (across 4 agents) rows written by triggered_by='heartbeat': 0 rows surviving were legacy bare-name keys from old manual runs This silently broke the AAO Verified badge pipeline (no storyboard rows → deriveVerificationStatus has nothing to verify against) and every agent's dashboard `storyboards_passing: 0 / N` was misleading: the runner wasn't failing storyboards, the parser was dropping them. Surfaced by escalation #329: Evgeny's agent was running 30/30 scenarios clean but showing `degraded` because specialism_status.signal-owned read 'untested' from a never-populated agent_storyboard_status row. Fix: read SDK output directly. Group scenarios by storyboard id, roll per-step pass counts up from each phase's `steps` array, fall back to phase-level counts when steps are absent. The `storyboardIds` override is preserved for explicit-IDs callers that need an `untested` entry when the runner didn't run a requested storyboard. The unused YAML `comply_scenario` field is no longer load-bearing for status mapping (the SDK already knows which storyboards it ran). Tests: 9 cases covering all-pass, partial, all-fail, phase-only fallback, legacy bare-name skip, empty input, and explicit-IDs untested gap. Stack note: this is orthogonal to Emma's #4247 compliance-state unification stack (#4250, #4263, #4264, #4268, #4274) which collapses agent_test_history into agent_compliance_runs. Different files; rebases cleanly in either order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(scripts): test-comply-storyboard-statuses — local harness for the fix Runs comply() against an agent URL and prints what deriveStoryboardStatuses would produce, without DB writes. Used to validate the SDK-6.x scenario-key fix against real agents (adcp-signals-adaptor.evgeny-193.workers.dev/mcp and wonderstruck.sales-agent.scope3.com/mcp) before merging. Will stay useful for future SDK upgrades that touch scenario emission or storyboard-track aggregation — same pattern as the diagnose-agent-comply-queue script from #4361. Usage: npx tsx server/src/scripts/test-comply-storyboard-statuses.ts <agent-url> [<agent-url> ...] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(compliance): code review nits — clarify steps doc, hoist explicit-ids check, add 3 edge tests Addresses code-reviewer feedback on PR #4364: - JSDoc on deriveStoryboardStatuses now calls out that steps_passed/total are not directly comparable across rows (some rows are real step counts, some are phase-level fallbacks when the SDK omits per-step data). - Comment pinning the storyboard-id invariant (flat ids, no `/`) so the indexOf split stays correct as new storyboards land. - Defensive `result.tracks ?? []` so a malformed result doesn't throw. - Hoist `storyboardIds && length > 0` into a single `hasExplicitIds` const used at both the toEmit decision and the no-data fallback. - Three new test cases: * same storyboard split across multiple tracks aggregates correctly * result.tracks absent → [] * non-string scenario values (null, number) → skipped without throwing 12/12 vitest passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Heads up — PR #4364 just merged to main, which fixes Without #4364 underneath, this PR's owner-test write path would have inherited the same bug — owner runs from Addie would have written canonical With #4364 in main: owner-test writes will produce real per-storyboard data on first run after this PR ships. Worth a rebase before merge so CI exercises both fixes together. Linking #4364 here for the chain: #4364 → #4250 → #4263 → #4264 → #4268 → #4274. |
…ce state Closes the 12-hour gap between owner-triggered storyboard runs and the public /api/registry/agents/:url/compliance endpoint (issue #4247, PR 1 of 4). When evaluate_agent_quality is triggered by the agent owner, the result is now written to agent_compliance_status + agent_compliance_runs + agent_storyboard_status with triggered_by = 'owner_test'. Non-owner runs continue writing to agent_test_history (deprecated in PR 3). Migration 471 adds 'owner_test' to both triggered_by CHECK constraints. notifyComplianceChange is intentionally suppressed for owner runs to prevent iteration-loop Slack spam. https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb
…ns test Address review feedback from @EmmaLouise2018 on PR #4250: 1. `verdict_source` field on /api/registry/agents/:url/compliance — `AgentComplianceDetailSchema` gains optional `verdict_source`: 'heartbeat' | 'owner_test' | 'manual' | 'webhook' | null — `getComplianceStatus` and `bulkGetComplianceStatus` join `agent_compliance_runs` via LATERAL subquery (dry_run=false, ORDER BY tested_at DESC LIMIT 1) to surface the triggered_by of the most recent run. No migration needed. — Endpoint response emits `verdict_source: status.last_triggered_by`. — `AgentComplianceStatus` interface gets `last_triggered_by` field. 2. Last-write-wins contract test — New `compliance-db-last-write-wins.test.ts` pins the ON CONFLICT DO UPDATE semantics: every recordComplianceRun call overwrites agent_compliance_status regardless of triggered_by source. A future change to first-write-wins or priority ordering would break these tests. https://claude.ai/code/session_01NVVqgeSGevUGXgDbMw1XKZ
…rtifacts
Generated JS/TS files don't belong in source control. Also adds
.gitignore rules for dist/schemas/*.{js,d.ts,*.map} to prevent recurrence.
https://claude.ai/code/session_01WrFMahjaHU7y4JWqPqxTbx
…tion_queue on main)
54a4f16 to
f7f933b
Compare
Six changes responding to expert review: 1. Renumber migration 472 → 475 (collision with 472_drop_member_profiles_ primary_brand_domain.sql on main since this PR was opened). 2. Drop `verdict_source` from the public /api/registry/agents/:url/compliance response. Heartbeat and owner_test both call comply() against the same registered URL with the same owner-saved credentials; the verdict's truth content is identical regardless of who pulled the trigger. Surfacing the source label to buyers creates a trust distinction the underlying observation doesn't carry. `triggered_by` stays as internal audit on agent_compliance_runs; Emma's #4263 dashboard surface keeps the operator UX cue. 3. Drop the legacy `recordComplianceRun(... 'manual')` write inside evaluate_agent_quality. Pre-PR, that write was invisible publicly. Post-PR (with verdict_source on the response), it would have leaked `verdict_source: 'manual'` for non-owner runs — security-reviewer's B1. Even with verdict_source removed (item 2), the legacy write was gated only on `agent_contexts` row existence — which `save_agent` lets any org create for any URL without an ownership check — so a non-owner could publish a verdict on someone else's agent. The owner-test branch covers the dashboard-freshness use case with a real ownership check; the legacy public-state write has no remaining function. Legacy agent_test_history write retained as session-scoped audit until Emma's PR 3 cleans up. 4. Dashboard "Run this storyboard" endpoint now writes `triggered_by = 'owner_test'` instead of `'manual'` (owner-only path, consistent with the new semantic). Legacy `'manual'` value remains in the enum for historical rows. 5. Rate limit on evaluate_agent_quality via existing Addie tool rate limiter (default 60/10min). comply() itself takes 10-60s per run so the natural ceiling is already ~1-2/min — this adds a hard wall above that. Security-reviewer's B2. 6. Updated changeset content + migration number references; replaces stale "migration 471" prose from earlier renumber. Out of scope (deferred follow-ups): - Badge issuance on owner_test transitions: owner currently waits up to 1h for next heartbeat to re-issue badge. Skipped because the badge fan-out in compliance-heartbeat.ts is ~70 lines and extracting/sharing is its own refactor. - Extracting resolveAgentOwnerOrg to a shared helper: same drift-surface call applies, kept inline to bound this PR's scope. - Active-membership filter: organization_memberships has no status column (row deletion = removal), so the security-review M2 concern doesn't apply at the schema layer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review iteration — addressed security + correctness feedbackPushed Blockers
Trust-model fix
Consistency fix
Out of scope — follow-ups, not part of this PR
N/A
Ready for re-review. |
Re-review (post-828ea1d)Two experts (code-reviewer + security-reviewer) reviewed the updated diff. No ship-blockers. Confirmed addressed:
Nits (non-blocking, your call):
Triaged by Claude Code. Session: https://claude.ai/code/session_014mrjryUmFVVrB2BzDhgMZR Generated by Claude Code |
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.
PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.
…4263) PR 2 of the #4247 unification stack. Reads two fields PR #4250 added to the compliance API but the dashboard wasn't yet rendering: - compliance tile: appends "(your test)" / "(heartbeat)" / "(manual)" / "(webhook)" after Last checked, so operators see whether the current verdict came from their own evaluate_agent_quality run or the scheduled heartbeat. - history panel: per-run badge with the same source label, info-blue for owner_test and neutral for the rest. Pre-PR-1 rows render with neutral — no regression. No backend changes; pure UI surfacing of fields already in the API. Stacked on PR #4250.
closes #4377) (#4388) PR #4250's review flagged the agent-ownership JOIN as a drift surface — duplicated inline in two places with subtly different semantics: - registry-api.ts:4825 `resolveAgentOwnerOrg(userId, agentUrl)` — "any org the user is a member of that owns the agent" - member-tools.ts:3588 (inline) — "the resolved member-context org IS the owning org" (tighter predicate that adds the org_id constraint) Both queries join `member_profiles.agents` against `organization_ memberships` — same canonical relation, different selectivity. This PR extracts both to `server/src/services/agent-ownership.ts`: - `findOwnerOrgForUser(userId, agentUrl): Promise<string | null>` — registry-api.ts's "discover ownership" use case - `isOrgOwnerOfAgent(orgId, userId, agentUrl): Promise<boolean>` — member-tools.ts's "confirm specific org" use case Both call sites updated to use the helpers. `resolveAgentOwnerOrg` is retained as a closure-scoped alias inside the registry-api factory so existing call sites don't need to thread the import. Note on active-membership filtering: `organization_memberships` has no status column in this schema — removed members get their row deleted, not status-flipped. Row existence is the membership signal. Documented in the helper file's module doc. Tests (7 passing): - findOwnerOrgForUser returns org_id, null on no match, null on throw - isOrgOwnerOfAgent returns true/false correctly - semantic distinction pinned: findOwnerOrgForUser uses 2 params (no org filter); isOrgOwnerOfAgent uses 3 params (org_id constraint) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #4247 — PR 1 of 4 in the compliance-state unification initiative.
Summary
Closes the 12-hour gap between an owner running
evaluate_agent_quality(via Addie chat or thetest_adcp_agentMCP tool) and the public/api/registry/agents/:url/complianceendpoint reflecting the result.Root cause:
evaluate_agent_qualitywrote toagent_test_history/agent_contexts.last_test_*(not visible to the compliance API). The heartbeat is the only prior writer to the canonical tables, so owner-triggered results were invisible until the next 12-hour cron.What this PR does (owner path only):
member_profiles.agents @> $1::jsonb JOIN organization_memberships— identical toresolveAgentOwnerOrginregistry-api.ts:4733.compliance_opt_outis not set: callscomplianceResultToDbInput()+complianceDb.recordComplianceRun()withtriggered_by = 'owner_test'anddry_run = false.'owner_test'to theTriggeredBytype and bothtriggered_byCHECK constraints (migration 471 — coversagent_compliance_runsandagent_storyboard_status).notifyComplianceChangeto prevent Slack spam on iterative owner test runs.agentContextDb.recordTest()write for backward compatibility (deprecated in PR 3).Known gap documented for follow-up: AAO Verified badge state is still updated only by the heartbeat (badge issuance calls
processAgentBadgeswhich is in the heartbeat loop, not inrecordComplianceRun). Compliance status reflects owner test results immediately; badge re-issuance still requires the next heartbeat. Tracked in #4247.Non-breaking justification: Adds an optional write path to existing canonical tables with a new
triggered_by = 'owner_test'value. No fields removed, no types changed, no existing writes modified. The new enum value is additive; the constraint drop-and-recreate is non-destructive DDL. Existing consumers of the compliance API gain data, not schema changes.Pre-PR review
organization_memberships,compliance_opt_outguard added,dry_run: falsecomment added, log level on owner-check failure raised towarnmember_profiles.agents @> $1::jsonb) matches established codebase pattern;notifyComplianceChangesuppression is correct per notification subscriber scope (Slack + external consumers via Scope3); badge-issuance gap acknowledged as follow-upSession: https://claude.ai/code/session_01UNHkGhBXk9XD2dpzvSLdhb
Generated by Claude Code